Skip to content

Filesystem only snapshots#2995

Merged
bchalios merged 8 commits into
mainfrom
feat/fs-only-snapshots-pause
Jun 16, 2026
Merged

Filesystem only snapshots#2995
bchalios merged 8 commits into
mainfrom
feat/fs-only-snapshots-pause

Conversation

@bchalios

Copy link
Copy Markdown
Contributor

What

Adds an opt-in snapshot mode that persists only the filesystem (rootfs) and skips the guest memory snapshot. Resuming such a snapshot cold-boots (reboots) a fresh VM from the rootfs instead of restoring memory.

  • Sandbox.Pause(WithFilesystemSnapshot()) — pauses without a memory snapshot; uploads only the rootfs + metadata (no memfile/snapfile).
  • Factory.RebootSandbox — the resume primitive that cold-boots from a snapshot's rootfs, marked routable only once the guest is ready.
  • Cold boot now writes the envd MMDS access-token hash, so rebooted secure sandboxes authenticate the same way memory-resumed ones do.
  • cmd/resume-build gains -fs-only and -reboot to exercise the full cycle locally.

Normal memory snapshots are unchanged — this mode is strictly opt-in.

Why

Memory snapshots are the expensive part of a pause (large artifacts, slow upload, more storage). Many workloads don't need warm RAM restored on resume — they only need their disk state. A filesystem-only snapshot is cheaper and smaller to take and store, and also serves as an escape hatch when memory artifacts are slow, oversized, or missing. The trade-off is explicit: a reboot loses RAM, running processes, connections, and unsynced writes; only the synced filesystem survives.

Scope / not in this PR

Orchestrator-only. No public API/proto/SDK wiring yet — the reboot path is reachable via resume-build but not yet over gRPC. API-side resume selection and auto-resume/connect gating (so a disk-only snapshot is never silently rebooted) are follow-ups. Guest cold-boot speedups live on a separate branch.

Testing

Verified to build/lint clean per commit. End-to-end pause→reboot validation (synced files persist, uptime resets, correct on-storage layout) is done locally via resume-build; secure-sandbox envd auth after reboot still needs validation against a deployment.

@cla-bot cla-bot Bot added the cla-signed label Jun 12, 2026
@cursor

cursor Bot commented Jun 12, 2026

Copy link
Copy Markdown

PR Summary

Medium Risk
Changes pause, upload, and start paths and makes fs-only pause depend on guest sync for durability; production gRPC resume is unchanged until follow-up wiring selects reboot.

Overview
Adds an opt-in pause mode that persists only rootfs: Pause accepts WithFilesystemSnapshot(), runs a mandatory guest sync before snapshotting, skips memfile export, and upload omits memfile and snapfile while keeping disk flush via CreateSnapshot. Resume for those builds is a new RebootSandbox cold boot from the snapshot rootfs (fresh RAM, routable only after envd), with Firecracker Create writing MMDS access-token metadata on that path so envd auth matches memory resume. resume-build exposes -fs-only and -reboot for local testing; scheduling and template reload tolerate missing memfile. Full memory snapshots stay the default.

Reviewed by Cursor Bugbot for commit 6a95194. Bugbot is set up for automated code reviews on this repo. Configure here.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

No critical findings or correctness issues were identified in the code changes, and no review comments were provided. I have no feedback to provide.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@bchalios bchalios force-pushed the feat/fs-only-snapshots-pause branch from f669bc7 to c609e2c Compare June 12, 2026 15:58
@bchalios bchalios marked this pull request as ready for review June 12, 2026 17:01

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c609e2c116

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

RootfsDroppedBuilds: uint32(rootDropped),
}

if memfileHeader != nil && memfileHeader.Metadata != nil {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve rootfs metadata after fs-only reload

This nil-memfile path only works while Pause still has the rootfs header in hand. Once a filesystem-only snapshot is uploaded and later reloaded, there is no memfile header/object, but storageTemplate.SchedulingMetadata still calls t.memfile.WaitWithContext and returns nil on that error before reading rootfs (packages/orchestrator/pkg/sandbox/template/storage_template.go:278-282). In contexts that report scheduling metadata from a loaded template, filesystem-only builds therefore lose even the rootfs affinity data; update that provider to tolerate the missing memfile and call FromHeaders with a nil memfile header and the resolved rootfs header.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed! storageTemplate.SchedulingMetadata now reports rootfs-only scheduling metadata for reloaded fs-only templates instead of dropping it.

@bchalios bchalios force-pushed the feat/fs-only-snapshots-pause branch from c609e2c to b2273c7 Compare June 12, 2026 17:45
Comment thread packages/orchestrator/pkg/sandbox/reclaim.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/sandbox.go
Comment thread packages/orchestrator/pkg/sandbox/reboot.go
@bchalios bchalios force-pushed the feat/fs-only-snapshots-pause branch from b2273c7 to d2c38fc Compare June 15, 2026 11:10
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit d2c38fc. Configure here.

Comment thread packages/orchestrator/pkg/sandbox/template/storage_template.go

const (
// minEnvdVersionForKVMClock is the minimum envd version that supports kvm-clock.
minEnvdVersionForKVMClock = "0.2.11"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit. It appears that this is duplicated with create_sandbox.go. Not that it's likely to be updated, but could be extracted into a single const.

func (s *Sandbox) guestSync(ctx context.Context) error {
syncTimeout := s.guestSyncTimeout(ctx)

ctx, span := tracer.Start(ctx, "envd-guest-sync")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we like a metric here as well to tell how long it usually takes?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, do we have enough data to separate the regular pause from fs-only in our metrics?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a metric for guest sync latency and span attribute to separate between full and filesystem-only snapshots.

@bchalios bchalios force-pushed the feat/fs-only-snapshots-pause branch from d2c38fc to dc9e2ea Compare June 15, 2026 17:25
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

bchalios and others added 8 commits June 16, 2026 10:44
Add Sandbox.Pause(WithFilesystemSnapshot()) to produce a snapshot that
persists only the rootfs and skips the guest memory snapshot. The default
remains a full memory snapshot; fs-only is strictly opt-in.

When enabled, before pausing the VM a mandatory guest sync is run via envd
(a failed sync fails the pause, since no memory snapshot preserves the page
cache and the rootfs would otherwise be missing acknowledged writes), and
memory prefetch is cleared. CreateSnapshot is still called for its disk
drain+flush side effect, but the memfile diff is left empty (NoDiff) with a
resolved-nil header.

scheduling.FromHeaders now tolerates a nil memfile header, emitting
rootfs-only scheduling metadata instead of dropping the metadata entirely.

Snapshot.FilesystemSnapshot records the decision at pause time: it can't be
inferred from the diff shape, since a memory snapshot with zero dirty pages
also yields a NoDiff memfile but still needs its snapfile uploaded.

Add -fs-only flag in cmd/resume-build to allow testing the operation
locally. Snapshots taken with -fs-only should include only rootfs
(meta)data.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
…ride flag

The mandatory pre-pause guest sync used a fixed 5s deadline, which is too
short under IO contention — a failed sync fails the (filesystem-only) pause.
The dirty page cache a sync must flush is bounded by guest RAM (pages already
written back are not re-flushed), so scale the deadline by RamMB against a
pessimistic 50 MiB/s flush-throughput floor, clamped to [5s, 2min].

Add the guest-sync-timeout-milliseconds feature flag as an operational
override: a positive value pins the deadline regardless of RAM; 0 (default)
keeps the RAM-derived value. Unit tests cover the scaling/clamping and the
flag override/fallback. Addresses PR review feedback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
…pshot

Export the memorySnapshot helper as MemorySnapshot and embed it in Snapshot
as a single field, replacing the flat MemfileDiff / MemfileDiffHeader /
MemfileBlockSize fields. The Diff, DiffHeader, and BlockSize fields are
exported (read cross-package by the upload, layer, and server paths); the
header and newBytes scheduling inputs stay unexported as they are used only
at Pause time.

Pure rename/restructure: no behavioral change. Consumers updated to read
snap.MemorySnapshot.Diff / .DiffHeader / .BlockSize.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
A filesystem-only snapshot now uploads only rootfs.ext4(.header) and
metadata.json. The snapfile upload is skipped in both the V3 and V4 paths
(it is still created during pause for its disk drain+flush side effect, just
not persisted), and NewUpload skips resolving the memfile compress config —
which would otherwise fail validation on the zero memfile block size when
compression is enabled.

The memfile body and header need no change: NoDiff.CachePath returns an empty
path (body self-skips) and the resolved-nil diff header makes both upload
paths return early. Absence of the memory artifacts on storage is the on-disk
signal that a snapshot is filesystem-only.

Memory snapshots are unaffected (FilesystemSnapshot is false).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
RebootSandbox cold-boots a fresh Firecracker VM from a snapshot's rootfs
without restoring guest memory — the resume primitive for filesystem-only
snapshots. It masks an empty memfile (sizing NoopMemory only), selects the
NBD provider (empty rootfs cache path) so guest TRIM and overlay chaining
work like a normal resume, uses the Sync IO engine, and boots via systemd
init. The sandbox is marked running only after envd is ready, matching
ResumeSandbox's routing guarantee; the caller's absolute end time is honored
so queue delay can't extend the TTL.

To support the deferred mark-running, CreateSandbox gains a
WithDeferredMarkRunning option (cold boot otherwise marks running before envd
is up). A StartTypeReboot start type is added, and the ext4 rootflags are
pulled into a named constant documenting that "noload" must never be set —
fs-only resume relies on ext4 replaying its journal on cold boot.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
A memory resume calls setMmds with the access-token hash (fc/process.go
Resume) so the guest envd can authenticate /init against the MMDS hash. Cold
boot (Process.Create) only configured the MMDS transport and never wrote the
data, so a rebooted secure sandbox would start envd with no hash and /init
would fall into the unauthenticated first-time-setup branch.

Process.Create now writes the MMDS metadata before startVM when
ProcessOptions.AccessToken is set, mirroring Resume. RebootSandbox always
sets it; an empty token hashes to the "no token" value, matching Resume's
behavior for non-secure sandboxes. Both the MMDS hash and the /init token
sent by WaitForEnvd are sourced from config.Envd.AccessToken, so they always
agree. Template-build cold boots leave AccessToken nil and are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Add a -reboot flag that cold-boots from the build's rootfs via
Factory.RebootSandbox instead of resuming from the memory snapshot, routed
through a new runner.startSandbox helper used by the resume, interactive,
cmd, and pause paths. Lets the filesystem-only pause + reboot resume cycle be
exercised end to end locally (e.g. `resume-build -fs-only -cmd-pause '...'`
then `resume-build -reboot -cmd 'cat /proc/uptime'`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Add a metric for measuring the latency of syncing the guest filesystem
before taking a snapshot. Also, add a span attribute in
`sandbox-snapshot` span that shows when a Pause() operation is on behalf
of a filesystem-only snapshot.

Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
@bchalios bchalios force-pushed the feat/fs-only-snapshots-pause branch from dc9e2ea to 6a95194 Compare June 16, 2026 08:47
@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@bchalios bchalios merged commit 62aa387 into main Jun 16, 2026
54 checks passed
@bchalios bchalios deleted the feat/fs-only-snapshots-pause branch June 16, 2026 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants